397 research outputs found
Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
Intelligibility is widely used to measure the severity of articulatory problems in pathological speech. Recently, a number of automatic intelligibility assessment tools have been developed. Most of them use automatic speech recognizers (ASR) to compare the patient's utterance with the target text. These methods are bound to one language and tend to be less accurate when speakers hesitate or make reading errors. To circumvent these problems, two different ASR-free methods were developed over the last few years, each making use of only the acoustic or the phonological properties of the utterance. In this paper, we demonstrate that these ASR-free techniques are also able to predict intelligibility in other languages. Moreover, they prove to be complementary, yielding even better intelligibility predictions when both methods are combined.
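The abstract does not specify how the two predictors are combined; a common and simple choice is a linear fusion whose weights are fit by ordinary least squares on a development set. The sketch below illustrates that idea with hypothetical function names and toy data, not the authors' actual fusion method:

```python
def lstsq_fuse(acoustic, phonological, targets):
    """Fit intelligibility ~ w0 + w1*acoustic + w2*phonological by ordinary
    least squares (normal equations solved with Gaussian elimination)."""
    rows = [[1.0, a, p] for a, p in zip(acoustic, phonological)]
    # Build the normal equations A^T A w = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        aty[col], aty[piv] = aty[piv], aty[col]
        for r in range(col + 1, 3):
            f = ata[r][col] / ata[col][col]
            for c in range(col, 3):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution
    w = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        w[r] = (aty[r] - sum(ata[r][c] * w[c] for c in range(r + 1, 3))) / ata[r][r]
    return w

def fuse_predict(w, acoustic_score, phonological_score):
    """Combined intelligibility prediction from the two ASR-free scores."""
    return w[0] + w[1] * acoustic_score + w[2] * phonological_score
```

Because the two score streams capture different (complementary) evidence, the fitted weights let each contribute where it is most reliable.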
Topic spotting using subword units
In this paper we present a new approach for topic spotting based on subword units and feature vectors instead of words. In our first approach, we use only vector-quantized feature vectors and polygram language models for topic representation. In the second approach, we use phonemes instead of the vector-quantized feature vectors and again model the topics using polygram language models. We trained and tested the two methods on two different corpora. The first is part of a media corpus which contains data from TV shows for three different topics. The second is the VERBMOBIL corpus, where we used 18 dialog acts as topics. Each corpus was split into disjoint test and training sets. We achieved recognition rates of up to 82% for the three topics of the media corpus and up to 64% using the 18 dialog acts of the VERBMOBIL corpus as topics.
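The core mechanism described here — one subword-level language model per topic, with classification by likelihood — can be sketched with a toy bigram (a "polygram" with N=2) model. All names below are hypothetical, and the real system scores higher-order polygrams over VQ codebook indices or phoneme sequences:

```python
from collections import Counter
import math

class PolygramTopicModel:
    """Bigram language model over subword units for one topic."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha          # add-alpha smoothing constant
        self.bigrams = Counter()
        self.unigrams = Counter()
        self.vocab = set()

    def train(self, sequences):
        """sequences: iterable of subword-unit lists for this topic."""
        for seq in sequences:
            padded = ["<s>"] + list(seq) + ["</s>"]
            self.vocab.update(padded)
            for a, b in zip(padded, padded[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1

    def log_prob(self, seq):
        """Smoothed log-likelihood of a subword sequence under this topic."""
        padded = ["<s>"] + list(seq) + ["</s>"]
        vocab_size = max(len(self.vocab), 1)
        lp = 0.0
        for a, b in zip(padded, padded[1:]):
            num = self.bigrams[(a, b)] + self.alpha
            den = self.unigrams[a] + self.alpha * vocab_size
            lp += math.log(num / den)
        return lp

def spot_topic(models, subword_seq):
    """Pick the topic whose language model assigns the highest likelihood."""
    return max(models, key=lambda t: models[t].log_prob(subword_seq))
```

The same scoring loop works whether the units are phoneme labels from a recognizer or codebook indices from vector quantization; only the training sequences change.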
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech, or tracking the effectiveness of speech therapy, would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 [1] for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on FluencyBank [2] and the German therapy-centric Kassel State of Fluency (KSoF) [3] dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, prolongations, sound repetitions, word repetitions, interjections, and, specific to therapy, speech modifications. Using embeddings from the fine-tuned models leads to relative classification performance gains of up to 27% in F1-score. Comment: Accepted at Interspeech 2022
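The downstream stage of this pipeline — one binary SVM per dysfluency event type, trained on fixed embeddings — can be illustrated with a from-scratch linear SVM (hinge loss, SGD). This is a stand-in sketch: the paper's embeddings come from a fine-tuned wav2vec 2.0 model and the classifiers would in practice be a standard SVM implementation, while the vectors and names here are toy assumptions:

```python
import random

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Train a binary linear SVM by SGD on the hinge loss.
    X: list of feature vectors (e.g. utterance embeddings); y: labels in {-1, +1}."""
    dim = len(X[0])
    w = [0.0] * dim
    b = 0.0
    data = list(zip(X, y))
    for _ in range(epochs):
        random.shuffle(data)
        for x, t in data:
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # hinge-loss violation: step toward the margin
                w = [wi - lr * (lam * wi - t * xi) for wi, xi in zip(w, x)]
                b += lr * t
            else:           # correctly classified with margin: only regularize
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def svm_predict(w, b, x):
    """+1 if the embedding is classified as containing the event, else -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# One independent detector per event type, as in the paper's setup:
EVENT_TYPES = ["block", "prolongation", "sound_repetition",
               "word_repetition", "interjection", "speech_modification"]
```

Training one detector per event type (rather than a single multi-class model) matches the evaluation protocol described in the abstract, where each stuttering-related event is scored separately.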
Syntactic-prosodic labeling of large spontaneous speech data-bases
In automatic speech understanding, the division of continuously running speech into syntactic chunks is a major problem. Syntactic boundaries are often marked by prosodic means. Training statistical models for prosodic boundaries requires large databases. For the German Verbmobil project (automatic speech-to-speech translation), we developed a syntactic-prosodic labeling scheme in which two main types of boundaries (major syntactic boundaries and syntactically ambiguous boundaries), as well as some other special boundaries, are labeled for a large Verbmobil spontaneous speech corpus. We compare the results of classifiers (multilayer perceptrons and language models) trained on these syntactic-prosodic boundary labels with classifiers trained on perceptual-prosodic and purely syntactic labels. The main advantage of the rough syntactic-prosodic labels presented in this paper is that large amounts of data could be labeled within a short time. As a result, the classifiers trained with these labels turned out to be superior (recognition rates of up to 96%).
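One of the classifier families mentioned here, a language model over word sequences with boundary tokens, can be sketched as follows: insert a boundary symbol into the training text wherever a labeled boundary occurs, then decide at test time whether a boundary follows a word by comparing the likelihood of the local context with and without the boundary token. This is a minimal illustration with hypothetical names and a bigram model, not the Verbmobil system itself:

```python
from collections import Counter
import math

class BoundaryLM:
    """Bigram LM over words interleaved with a boundary token '<B>'."""
    def __init__(self, alpha=0.5):
        self.alpha = alpha          # add-alpha smoothing constant
        self.bigrams = Counter()
        self.unigrams = Counter()
        self.vocab = set()

    def train(self, labeled_sequences):
        """Each sequence is a token list with '<B>' at labeled boundaries,
        e.g. ["i", "agree", "<B>", "see", "you"]."""
        for toks in labeled_sequences:
            self.vocab.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.bigrams[(a, b)] += 1
                self.unigrams[a] += 1

    def _lp(self, a, b):
        vocab_size = max(len(self.vocab), 1)
        return math.log((self.bigrams[(a, b)] + self.alpha) /
                        (self.unigrams[a] + self.alpha * vocab_size))

    def boundary_after(self, prev_word, next_word):
        """True if inserting '<B>' between the two words is more likely
        than the direct word-to-word transition."""
        with_boundary = self._lp(prev_word, "<B>") + self._lp("<B>", next_word)
        without_boundary = self._lp(prev_word, next_word)
        return with_boundary > without_boundary
```

Because the labels only need to be inserted into running text, this kind of model can exploit exactly the large, quickly labeled corpora the abstract argues for.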